fix array OOBE in blocked bloom filter when top 4 bits of hash are se… #14

richardstartin · 2019-12-23T21:49:36Z

…t (seed dependent behaviour)

richardstartin · 2019-12-23T23:05:09Z

In the blocked Bloom filters, was it intended that the same key would never be mapped to the same word twice (it now is whenever the upper 4 bits of the hash are zero), hence the plus 1? Should the solution be to increase (double?) the size of the bitset instead?
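To make the question above concrete, here is a minimal sketch of the index arithmetic, assuming the second word is chosen at an offset taken from the upper 4 bits of the hash, optionally plus 1. The class and method names are hypothetical and this is not the BlockedBloom code; it only shows how far past the first word the second access can reach, which is the padding the backing array would need to avoid an out-of-bounds access.

```java
// Hypothetical sketch, not the fastfilter implementation.
final class BlockedBloomIndexSketch {

    // Offset of the second word relative to the first, derived from
    // the upper 4 bits of the hash (0..15), optionally shifted by 1.
    static int secondWordOffset(long hash, boolean plusOne) {
        int top4 = (int) (hash >>> 60);
        return plusOne ? top4 + 1 : top4;
    }

    // Largest array index touched when the first word index lies in
    // [0, buckets) and all of the upper 4 bits of the hash are set.
    static int maxIndex(int buckets, boolean plusOne) {
        return (buckets - 1) + secondWordOffset(-1L, plusOne);
    }

    public static void main(String[] args) {
        int buckets = 100;
        // With "+1": offsets 1..16, so the array needs buckets + 16 words.
        System.out.println(maxIndex(buckets, true) + 1);
        // Without "+1": offsets 0..15, and offset 0 reuses the first word.
        System.out.println(maxIndex(buckets, false) + 1);
        System.out.println(secondWordOffset(0x0FFFFFFFFFFFFFFFL, false));
    }
}
```

With the plus 1 the offset ranges over 1..16, so the two words always differ; without it the range is 0..15, and an offset of 0 lands on the first word again.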

thomasmueller

See comments... Unfortunately I don't have much time currently to work on this, but I will give you access rights so you can commit yourself once the issues are addressed.

fastfilter/src/main/java/org/fastfilter/bloom/BlockedBloom.java

thomasmueller · 2019-12-24T09:05:42Z

fastfilter/src/main/java/org/fastfilter/bloom/BlockedBloom.java

 public class BlockedBloom implements Filter {

    public static BlockedBloom construct(long[] keys, int bitsPerKey) {
-        long n = keys.length;


Yes, using long doesn't make sense here... I used long in some places to avoid integer overflow, but here it's not needed.

In cases where you can reason that overflow is impossible, accumulating into a long can cause significant degradation in throughput once the loop has been compiled by C2. I can share some benchmarks which demonstrate this. Of course, sometimes it's the only safe thing to do...

Hm, I wasn't aware of this... I will keep this in mind!
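As an illustration of the point about accumulator width, here is a small sketch with two otherwise identical loops. The class and method names are hypothetical. In the first loop the count can never exceed the array length, which itself fits in an int, so an int accumulator cannot overflow. The sketch makes no throughput claim of its own; any difference would have to be measured, for example with JMH.

```java
// Hypothetical sketch: same loop, two accumulator widths.
final class AccumulatorSketch {

    // The result is bounded by keys.length, which is at most
    // Integer.MAX_VALUE, so overflow is impossible here.
    static int countNegativeInt(long[] keys) {
        int count = 0;
        for (long k : keys) {
            if (k < 0) {
                count++;
            }
        }
        return count;
    }

    // Same logic with a long accumulator, the "always safe" variant.
    static long countNegativeLong(long[] keys) {
        long count = 0;
        for (long k : keys) {
            if (k < 0) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        long[] keys = {-1L, 2L, -3L, 4L};
        System.out.println(countNegativeInt(keys));
        System.out.println(countNegativeLong(keys));
    }
}
```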

thomasmueller · 2019-12-24T09:09:12Z

fastfilter/src/main/java/org/fastfilter/xor/XorSimple.java

        }
        int si = 0;
-        while (si < 2 * keys.length) {
+        while (si < 2 * keys.length && qi > 0) {


The XorSimple is (on purpose) the most simple possible implementation. It is only for educational purposes. It shouldn't have any performance improvements. But of course it shouldn't have any bugs either... So not sure if this is really needed?

This fixes a bug where qi goes negative in the loop, rather than the loop failing to map. This occurred for the first time with the seed in the regression tests, but cannot be reproduced because of the unseeded calls to new Random().nextLong() (which #13 aims to fix). Basically there do exist cases where this loop will access Q[-1] rather than fail to map and try again, but it's nondeterministic.

I suggest you omit this change from your PR and, instead, report it as a bug. It seems likely to me that your fix (qi > 0) is just patching over a logical error of some kind. It would be better to identify the logical error.

OK I will do that.
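For readers following the thread above, this is a toy sketch of the general pattern under discussion: a loop that pops from an array-backed queue until a target count is reached, and reports failure when the queue runs dry instead of reading index -1. The names (`drain`, `q`, `qi`, `needed`) and the acceptance test are hypothetical and do not reproduce XorSimple's logic. The sketch also does not settle whether such a guard hides a deeper logical error, which is the concern raised in the review.

```java
// Hypothetical sketch of a guarded pop loop, not the XorSimple code.
final class PeelLoopSketch {

    // Pops entries from q (top of stack at qi - 1) until `needed`
    // entries have been accepted. Returns false if the queue is
    // exhausted first, so a caller could retry, e.g. with a new seed.
    static boolean drain(int[] q, int qi, int needed) {
        int done = 0;
        while (done < needed) {
            if (qi == 0) {
                return false; // nothing left to pop: fail instead of reading q[-1]
            }
            int candidate = q[--qi];
            if (candidate >= 0) { // stand-in for the real acceptance test
                done++;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(drain(new int[] {1, 2, 3}, 3, 3));
        System.out.println(drain(new int[] {1, -1, 3}, 3, 3));
    }
}
```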

fastfilter/src/test/java/org/fastfilter/RegressionTests.java

richardstartin · 2019-12-24T10:01:56Z

@thomasmueller I sensed this wasn't the correct fix for the blocked bloom filters so will address that.

…simple fuzzer back, ignoring MPHF and GCS2

richardstartin · 2019-12-26T19:15:02Z

@lemire do you have any time to look at this?

lemire · 2019-12-26T19:36:40Z

@richardstartin See my proposal. I think that only one change you are making is somewhat controversial in that it might be hiding a real bug instead of exposing it. I recommend you omit this change and then I think that the whole PR could be safely merged without any controversy.

thomasmueller · 2019-12-26T20:14:52Z

I would love to give you more feedback, but I'm afraid I have very little time now... Next year should be better.

…CS2, COUNTING_BLOOM

richardstartin · 2019-12-27T15:03:27Z

I will leave the branch like this - there are several false negative bugs which aren't trivial to fix, and I've reverted the change to XorSimple. SimpleFuzzer will stop crashing on XorSimple when/if it gets fixed.

lemire · 2019-12-27T16:11:23Z

@thomasmueller

I recommend merging this PR. Do you agree?

thomasmueller · 2019-12-27T16:24:21Z

I think it would break the build, see https://travis-ci.com/FastFilter/fastfilter_java/builds/142464328

What about disabling the tests for now, so the build passes, and creating issues for the problems? That way I think merging is fine.

richardstartin · 2019-12-27T16:25:54Z

You shouldn't merge the PR, I've left it to break to highlight the existence of various bugs which need fixing or otherwise removing.

lemire · 2019-12-27T16:29:56Z

Ok.

My view was that it does not really break anything, it merely exposes existing problems and thus is safe to merge. But I see that Richard has opened distinct bug issues.

richardstartin · 2019-12-27T16:36:57Z

@lemire I don't think there is a rush to merge it. I was keen to help get the project to the point where it could be released and used, but there are several nondeterministic bugs I don't have enough context to be able to fix, so my goal may have been a little premature. The fixes which are made in this PR are trivial, and merging the branch will just break master. It's worth leaving around for now to verify that the harder fixes have been made.

lemire · 2019-12-27T16:37:41Z

Ok.

thomasmueller · 2019-12-27T20:46:25Z

I merged the patch, as it improves the code a lot. I will try to resolve the remaining issues.

One option is to move the buggy implementations (especially MPHF, GCS2) to the "test" source folder, until they are fixed (kind of like "staging").

richardstartin · 2019-12-27T22:19:26Z

@thomasmueller I think moving them to test makes sense. Whilst there are completely new filter implementations, the availability of some filters for evaluation purposes only could be confusing for users.

lemire · 2019-12-27T22:57:09Z

+1

fix array OOBE in blocked bloom filter when top 4 bits of hash are se…

fd7c405

…t (seed dependent behaviour)

richardstartin mentioned this pull request Dec 23, 2019

Fuzz testing #15

Open

richardstartin added 4 commits December 23, 2019 22:05

record failing commit for SuccinctCountingBlockedBloom

011e9f3

fix OOBE in SuccinctCountingBlockedBloom

9cac300

capture more failures, leave GCS2 broken

9a045ad

leave GCS2 broken as it is too slow and complicated to fix

adc4d9b

richardstartin mentioned this pull request Dec 23, 2019

[proposal] implement builder API #13

Closed

thomasmueller reviewed Dec 24, 2019

change fixes to blocked bloom filters, fix OOBE bugs in cuckoo+, add …

71f78bc

…simple fuzzer back, ignoring MPHF and GCS2

richardstartin force-pushed the blocked-bloom-bug branch from fcfaab9 to 71f78bc Compare December 24, 2019 10:46

revert change to XorSimple, add false negative test cases for MPHF, G…

5100ef9

…CS2, COUNTING_BLOOM

thomasmueller merged commit 9b6c6d8 into FastFilter:master Dec 27, 2019
